feat(utility): constexpr string-parse primitives (Phase A of #835)#838
Conversation
Adds include/sw/universal/utility/string_parse.hpp -- a shared foundation of constexpr-friendly primitives that the number systems will adopt in later phases of issue #835. Why --- Today's parse() implementations are reimplemented per type (posit, cfloat, integer, fixpnt, lns, ...), each with its own scanner. Format coverage has diverged (lns refuses decimal value strings while fixpnt accepts them; posit accepts a custom "nbits.esXhex" form; etc.). Without a shared foundation, every type that adopts string parsing will continue this divergence. This file is the foundation. All primitives operate on std::string_view and are constexpr -- they can be called from constexpr ctors once the number systems are migrated in Phase B. What's in the header -------------------- sw::universal::string_parse namespace: enum number_base { binary, octal, decimal, hex, unknown }; prefix_scan scan_prefix(string_view); // 0b/0o/0x -> base + body sign_scan scan_sign(string_view); // +/- detection bit_pattern_result parse_binary/parse_octal/parse_hex(string_view); // MSB-first uint64_t with overflow decimal_float_scan scan_decimal_float(string_view); // [-+]?int.frac[eE][-+]?exp // returns string_views into input // plus signed int32 exponent Plus character classifiers (is_decimal_digit, is_hex_digit, ...) and a hex_digit_value helper. Phase A explicitly does NOT perform value reconstruction (turning digit strings into a number type's bit representation). That step depends on the target type's precision and rounding rules and so belongs in each type's own parse() in Phase B. Tests ----- static/utility/test_string_parse.cpp covers all primitives with 57 test assertions plus 8 static_assert smoke tests proving constexpr. - scan_prefix: 0b/0B/0o/0O/0x/0X case variants, decimal default, leading-0-without-prefix, empty input - scan_sign: '+', '-', no sign, empty, double-sign (first only) - parse_binary/octal/hex: valid input, partial-stop-at-invalid-char, 64-bit boundary (no overflow), 65/17-digit overflow detection, empty -> invalid - scan_decimal_float: integer-only, fraction-only, integer+fraction, positive/negative exponent, mixed signs, malformed (bare dot, trailing garbage, "1e" no exp digits), 50-digit fraction Passes 57/57 on gcc and clang. This commit closes nothing in #835 by itself. Phase B (posit/cfloat/ integer/fixpnt uplift onto this foundation) will reference the same issue and start consuming these primitives. Relates to #835 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a new header ChangesString Parsing Primitives
CI Conventional Commits
🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
The PR title for #838 uses feat(utility): which the conventional-commits check rejected because "utility" wasn't in the scope list. The library has an actual include/sw/universal/utility/ directory for shared infrastructure (bit_cast, find_msb, convert_to, ...), so the scope is legitimate and will recur for any future utility additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@static/utility/test_string_parse.cpp`:
- Around line 254-334: Add tests in test_scan_decimal_float to cover int32
boundary exponents because scan_decimal_float delegates exponent parsing to
detail::parse_int32; specifically, add cases asserting that "1e2147483647"
yields valid with exp10 == INT32_MAX, "1e-2147483648" yields valid with exp10 ==
INT32_MIN, and that "1e2147483648" is rejected (invalid) to verify overflow
handling in scan_decimal_float/detail::parse_int32.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2246d01e-9a76-4a77-bc75-6f967ceedf31
📒 Files selected for processing (2)
include/sw/universal/utility/string_parse.hppstatic/utility/test_string_parse.cpp
Addresses the round-1 CodeRabbit review of #838: add tests covering the int32 exponent boundary in scan_decimal_float. Writing the requested tests surfaced a real bug: parse_int32 used a single bound v > 2147483647 to reject overflow, which (correctly) rejects positive values above INT32_MAX but (incorrectly) also rejects the well-formed representation of INT32_MIN (where |INT32_MIN| = 2^31 = 2147483648). For "1e-2147483648" the accumulator reaches 2147483648 before the sign is applied, so the check fires too early. Fix: track sign first and pick the bound conditionally positive: at most INT32_MAX = 2147483647 negative: at most |INT32_MIN| = 2147483648 The int64 -> int32 cast at the end yields INT32_MIN for v = 2147483648 under C++20's mandated two's-complement. Added tests: - "1e2147483647" -> valid, exp10 == INT32_MAX - "1e-2147483648" -> valid, exp10 == INT32_MIN - "1e2147483648" -> invalid (overflow on positive side) 60/60 tests pass on gcc and clang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage Report for CI Build 25959866834Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.007%) to 83.953%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions8 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
…hase B1 of #835) (#839) * feat(integer, fixpnt): migrate/implement parse() onto string_parse (Phase B1 of #835) Phase B1 of issue #835: bring `integer` and `fixpnt` onto the shared constexpr string-parsing foundation that landed in Phase A (#838). integer ------- Replaces the std::regex + std::map + reverse-iteration parser body in parse(const std::string&, integer&) with a clean MSB-first scan that delegates prefix/sign detection and character classification to the `sw::universal::string_parse` primitives. Notable functional changes: - The previous octal branch was a `// TODO` that always returned false after detecting C-style leading-zero octal. Replaced with a real implementation gated on the explicit `0o` / `0O` prefix. - Binary (`0b` / `0B`) is newly supported. - Hex still accepts apostrophe as a digit separator (e.g. `0xDE'AD'). - Decimal accepts a single optional leading `+` or `-`. - Drops the `<regex>` and `<map>` includes from integer_impl.hpp. fixpnt ------ `fixpnt::parse()` was previously forward-declared (fixpnt_fwd.hpp:22) and called by `operator>>` (fixpnt_impl.hpp:2073) but had no definition -- a latent link error that nobody had hit because no test exercises stream input on fixpnt. This PR provides the definition. Accepted syntax: [+-]? ( 0[bB][01]+ | 0[oO][0-7]+ | 0[xX][0-9A-F']+ | [0-9]+ ) Bit-pattern parsing (binary / octal / hex) fills the underlying storage MSB-first via setbit(0) + left shift -- matches the convention of setbits(). Decimal parsing is integer-only in this phase: the digit string is accumulated as an integer K and stored as `K << rbits`, so parse("5") on fixpnt<8,4> yields the value 5.0. Decimal-fraction parsing ("3.14") is deferred to Phase B2 along with the analogous float-from-string work for posit and cfloat. Tests ----- - static/integer/binary/api/string_parse.cpp -- 33 assertions covering decimal, binary, octal, hex with sign handling and rejection of malformed inputs across integer<8,16,32,64> widths. - static/fixpnt/binary/api/string_parse.cpp -- 16 assertions covering the same format matrix on fixpnt<8,4>, <9,4>, <16,8> with explicit bit-pattern verification via setbits-comparison. Both test targets pass 33/33 and 16/16 on gcc and clang. Existing integer api regression (`bint_api`) unchanged. Relates to #835 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(integer, fixpnt): address CodeRabbit review on PR #839 Four items from CodeRabbit's review of the Phase B1 PR: 1. **Actionable: fixpnt decimal branch UB + policy bypass** -- the previous implementation used `setbits(static_cast<uint64_t>(10) << rbits)` to construct 10.0 and the per-digit operand. This both (a) bypassed the Saturate / Modulo arithmetic policy that *= and += would otherwise enforce, and (b) invoked UB for any fixpnt instantiation with rbits >= 64. Replaced with FP(10) and FP(d) using the converting constructor from native int. Removed the dead one_ulp placeholder. 2. **Actionable: fixpnt hex branch silent-success on only-separator input** -- a payload of just apostrophes (e.g. "0x'", "0x''") would skip every iteration via the separator continue and exit the loop with no real digit processed but no error, leaving value zero. Same gap existed in integer's parse() (and in binary / octal branches for any future conventions that introduce separators). Added a digit_found flag in every bit-pattern branch in both files; return false if no real digit was seen. Decimal keeps the flag for symmetry though is_decimal_digit already gates the loop body. 3. **Actionable: fixpnt operator>> failure semantics** -- the istream operator merely logged to std::cerr on parse failure, leaving the stream in a "successful" state so callers (e.g. `while (in >> x)`) couldn't detect the error. Now sets failbit via istr.setstate. 4. **Nitpick: integer test type mismatch** -- check_parse<16, std::uint8_t> was invoked with I8(15) / I8(64) as the expected value (relied on implicit narrowing via the converting constructor). Added an I16 alias and used it for the 16-bit cases. Test additions exercise the new rejection paths: - "0x'", "0x''", "0x'''" -> integer and fixpnt both reject Plus the type-corrected octal tests on I16. Final counts: integer 36/36, fixpnt 18/18. Passes on gcc and clang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(fixpnt): keep cerr diagnostic alongside the failbit on operator>> parse error Round-2 (70d0ce6) replaced the cerr log with setstate(failbit) per CodeRabbit's review of #839. CodeRabbit's prompt was explicit that the cerr message was optional, but on reflection both belong: failbit is the programmatic signal (while (in >> x) loops, etc.) and the cerr message helps interactive debugging. posit's operator>> uses the same dual-signal pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(fixpnt): rename FP alias to Fixpnt for unambiguous fixed-point semantics Per maintainer review on PR #839: the alias `FP` reads as "floating-point" to most numerics readers, which is exactly the wrong association inside a fixed-point parser. Renamed to `Fixpnt` (matching the class name) in both sites: - include/sw/universal/number/fixpnt/fixpnt_impl.hpp parse() body (the `using FP = fixpnt<...>` alias and its two consumer references) - static/fixpnt/binary/api/string_parse.cpp test helper aliases (FP8_4 -> Fixpnt8_4, FP16_8 -> Fixpnt16_8) Pure rename. No behavior change. Tests still 18/18 on gcc and clang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(integer): symmetric failbit-on-parse-error in operator>> Mirrors the fix applied to fixpnt::operator>> in this PR. integer<>'s istream extractor previously only logged to std::cerr on parse failure, leaving the stream in a "successful" state -- a `while (in >> x)` loop would not terminate on bad input. Now sets failbit on failure (programmatic detection signal) while keeping the cerr message (interactive-debugging signal). Same dual-signal pattern as fixpnt. Drive-by: the original cerr message read "into a posit value" -- a copy-paste leftover from posit's identical extractor. Corrected to "into an integer value". bint_string_parse still 36/36 and bint_api passes; no behavior change for the success path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(integer, fixpnt): commit parse result only on success CodeRabbit's round-2 review (PR #839) flagged that the parsers cleared the output value at entry and then mutated it in place, so any mid-parse `return false` left the caller's object partially updated (e.g., zero followed by a few bits, rather than the original state). Standard practice for parse() is "leave output untouched on failure." Refactored both integer and fixpnt parse() to build into a local temporary (`Int tmp` / `Fixpnt tmp`) and only assign `value = tmp` at the very end on a fully successful parse. Every existing `return false` path now exits without touching the caller's `value`. Two new regression assertions per file verify the contract: - Hex-with-invalid-trailing-char ("0x1G") on a pre-set value leaves it at the previous value. - Decimal-then-garbage ("123abc" / "12abc") same. Both apply to integer and fixpnt for symmetry, even though CR only called out fixpnt explicitly (the integer parser had the same pattern). Final counts: integer 38/38, fixpnt 20/20 on gcc and clang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(integer, fixpnt): adopt canonical ReportTestSuite pattern for parse() tests Replaces the homegrown g_total/g_failures bench from the original Phase B1 tests with the convention used across hundreds of existing tests in static/*/api/*: ReportTestSuiteHeader, scope-block test groups with "int start = nrOfFailedTestCases" + per-assertion ++nrOfFailedTestCases, trailing "if (nrOfFailedTestCases - start > 0) std::cout << \"FAIL: ...\"" diagnostic, and ReportTestSuiteResults at the end. Also adds the standard catch-block footer (ad-hoc / arithmetic / internal / runtime / ...). No behavior change in what is tested -- same matrix of decimal / binary / octal / hex / invalid-rejection / commit-on-success cases. Just the reporter idiom now matches the codebase. Output now reads: integer<> parse() string-parsing test suite: PASS fixpnt<> parse() string-parsing test suite: PASS Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se foundation test (#840) Follow-on cleanup for PR #838 (Phase A of #835). The foundation test landed with a homegrown g_total/g_failures bench instead of the ReportTestSuiteHeader / scope-block / ReportTestSuiteResults idiom used by the rest of the Universal regression suite (and applied to the integer/fixpnt parse() tests on PR #839). Refactor mirrors that style: - ReportTestSuiteHeader at entry, ReportTestSuiteResults at exit - Each primitive (scan_prefix / scan_sign / parse_binary/octal/hex / scan_decimal_float) is one scope block: `int start = nrOfFailedTestCases` at the top, per-assertion `if (...) ++nrOfFailedTestCases`, trailing `if (nrOfFailedTestCases - start > 0) std::cout << "FAIL: ..."` diagnostic. - Standard catch footer (ad-hoc / runtime / ...) Static_assert smoke tests at file scope (proving constexpr) are kept verbatim -- they verify a different invariant (compile-time evaluation) that the runtime block can't. Same test matrix, same coverage. Output now reads: string_parse primitives (Phase A of #835) test suite: PASS Relates to #835 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#865) * feat(edecimal): extend parse to decimal-point and scientific notation Previously edecimal::parse only accepted integer literals matching [+-]*[0-9]+ via std::regex_match. Anything with a decimal point or an exponent suffix was rejected, even when the value was exactly representable as an integer ("3.14e2" = 314). Switch the tokenizer to sw::universal::string_parse::scan_decimal_float (the foundation from #838) which yields int_part, frac_part, and a signed exp10. The effective decimal exponent is exp10 - frac.size(): when non-negative, the value is an exact integer and we accept it; when negative the value has fractional digits that edecimal cannot represent without precision loss, so we reject ("3.14", "1.5e-100", "0.001"). This matches the issue spec's "preserve those digits exactly" rule. Accepted forms now include: "42", "-1000" -- integer (unchanged) "3.14e2", "-2.5e1" -- decimal point with shift to integer "1e10", "1.5e10" -- pure exponent or compatible "3.14e+200" -- 201-digit exact integer "5.", ".5e1" -- edge syntax that scan_decimal_float allows Side effects: - Call unpad() after parsing so "0042" / "0.0042e4" no longer carry leading-zero limbs. - Collapse "-0" / "-0.0e5" to +0 (no negative zero). operator>> hygiene (failbit + extraction guard) was already shipped in #858 (Phase E of #835); the test file now also pins it. Test (elastic/decimal/conversion/string_parse.cpp) extended to 9 groups: - integer parse (canonical + large) - scientific exact (with the 201-digit "3.14e+200" reference) - decimal-point form that produces an integer - fractional input rejected (3.14, -0.5, 0.001, 1.5e-100, 3.0, 10.50e0) - malformed reject (empty, alpha, "1e", ".", "1.2.3", "1e3.5", "42x", "0x1F") - negative-zero collapse - operator>> failbit on bad token - operator>> success on scientific token in whitespace - operator>> empty stream Resolves #854 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(edecimal): cap parse() expansion to prevent unbounded allocation CodeRabbit round 1 on #865 flagged a DoS surface: scan_decimal_float returns an int32 exponent, so an input like "1e2000000000" would loop push_back ~2 billion trailing zeros inside parse, allocating ~2 GiB. Cap the post-expansion digit count at 1 MiB (1,048,576 digits) and use vector::reserve + vector::insert in place of repeated push_back so the accepted path also stays O(N) without amortized growth. Test additions: - "1e2000000000", "1e2147483647" (INT32_MAX), "1e10000000", "1e1048576" (cap+1) -- all rejected. - The cap is exclusive: 1e1048575 (1 MiB significand) still parses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#866) Previously erational::parse only accepted integer literals matching [+-]*[0-9]+ and left the denominator unset (relying on the prior state, which defaulted to 1 only on a freshly constructed value). Replace parse() with a routing function and a static helper parse_decimal_to_fraction(s, num, den, neg) that uses scan_decimal_float (the foundation from #838) to tokenize either an integer, a decimal, or a scientific literal into an exact (numerator, denominator) edecimal pair: "42" -> num=42, den=1 "3.14" -> num=314, den=100 "1.5e2" -> num=150, den=1 "1.5e-1" -> num=15, den=10 For the p/q form, split on '/' and parse each half through the same helper, then combine as (p_num * q_den) / (p_den * q_num). Sign is the XOR of the two sides. Mixed forms like "3.14/2" and "1e2/2e1" work because each side is independently a decimal/scientific literal. "1/2" -> 1/2 "-22/7" -> -22/7 "22/-7" -> -22/7 "4/8" -> 1/2 (normalize() reduces via GCD) "3.14/2" -> 157/100 Rejected forms: - q == 0 across all flavors: "1/0", "5/0.0", "0/0", "1/0e10" (erational has no NaR encoding, and silently representing infinity would mask downstream divide-by-zero detection) - Two slashes: "1/2/3" - Empty side: "1/", "/2", "/" - Malformed decimal: "3.14.15", "1e", ".", "42x" Defensive cap: parse_decimal_to_fraction rejects any input whose significand or denominator would exceed 2^20 (1,048,576) digits, the same cap used by edecimal::parse (#854). operator>> hygiene (failbit + extraction guard) was already shipped in #858 (Phase E of #835); the test file now also pins it. Test (elastic/rational/decimal/conversion/string_parse.cpp) extended to 11 groups: integer, p/q with simplification, decimal-to-rational, scientific, mixed p/q with decimal sides, q=0 rejection, malformed, negative-zero collapse, operator>> failbit on bad, operator>> on a p/q token in whitespace, operator>> empty stream. Resolves #855 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
include/sw/universal/utility/string_parse.hpp-- a shared, constexpr-friendly foundation for the string-parsing APIs that each number system will adopt in later phases of issue feat: create a decimal string parsing API for all number systems #835.Why this needs a foundation first
A pre-implementation survey (see issue #835 for the roadmap) found that today's
parse()implementations are reimplemented per type, with divergent format coverage:lnsrefuses decimal value strings whilefixpntaccepts thempositaccepts a customnbits.esXhexvalueform not found anywhere elsedfloataccepts decimal but not binary bit-patterns;lnsis the oppositeWithout a shared scanner, every type that adopts string parsing will continue this divergence. Phase A fixes that.
What's in the header
sw::universal::string_parsenamespace:scan_prefix(string_view){number_base, body}scan_sign(string_view){negative, rest}parse_binary/octal/hex(string_view){value, digits, overflow, valid}uint64_taccumulation with overflow detectionscan_decimal_float(string_view){valid, negative, int_part, frac_part, exp10}[-+]?int.frac[eE][-+]?exptokenized to string_views over inputPlus character classifiers (
is_decimal_digit,is_hex_digit, ...) and ahex_digit_valuehelper.Scope (what's explicitly out)
Phase A does NOT perform value reconstruction (turning digit strings into a number type's bit representation). That step depends on the target type's precision and rounding rules and so belongs in each type's own
parse()in Phase B and later.scan_decimal_floatreturnsstring_views pointing into the input, plus a signedint32_texponent -- so the caller can iterate the digits and accumulate into whatever representation it wants (uint64_t, multi-limbblockbinary, decimal accumulator, ...).Tests
static/utility/test_string_parse.cpp-- 57 test assertions + 8static_assertsmoke tests proving constexpr.scan_prefix: 0b/0B/0o/0O/0x/0X case variants, decimal default, leading-0-without-prefix, empty inputscan_sign: '+', '-', no sign, empty, double-sign (first only consumed)parse_binary/octal/hex: valid input, partial-stop-at-invalid-char, 64-bit boundary (no overflow), 65/17-digit overflow detection, empty -> invalidscan_decimal_float: integer-only, fraction-only, integer+fraction, positive/negative exponent, mixed signs, malformed (bare dot, trailing garbage,"1e"no exp digits), 50-digit fraction (constexpr-friendly: views point into input, no allocation)Test results
utility_test_string_parsePhased roadmap (for context; only Phase A is in this PR)
Test plan
Relates to #835
Generated with Claude Code
Summary by CodeRabbit
New Features
Tests